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Abstract. We develop an algorithm for computing the solution of a large 
system of linear ordinary differential equations (ODEs) with polynomial in- 
homogeneity. This is equivalent to computing the action of a certain matrix 
function on the vector representing the initial condition. The matrix function 
is a linear combination of the matrix exponential and other functions related 
to the exponential (the so-called ip-functions). Such computations are the ma- 
jor computational burden in the implementation of exponential integrators, 
which can solve general ODEs. Our approach is to compute the action of the 
matrix function by constructing a Krylov subspace using Arnoldi or Lanczos 
iteration and projecting the function on this subspace. This is combined with 
time-stepping to prevent the Krylov subspace from growing too large. The 
algorithm is fully adaptive: it varies both the size of the time steps and the 
dimension of the Krylov subspace to reach the required accuracy with mini- 
mal effort. We implement this algorithm in the MATLAB function phipm and we 
give instructions on how to obtain and use this function. Various numerical 
experiments show that the phipm function is often significantly more efficient 
than the state-of-the-art. 

1. Introduction 

In recent years there has been a resurgence of interest in a class of numerical 
methods for the solution of ordinary differential equations (ODEs) known as expo- 
nential integrators. These are intended to be used on ODEs which can be split into 
a stiff linear part and a non-stiff nonlinear part. This splitting can be done once 
or several times, as need be. As their name suggests, exponential integrators use 
the matrix exponential and various related matrix functions, generally referred to 
as (^-functions, within the numerical integrator. The computational cost of expo- 
nential integrators is dominated by the need to evaluate these (^-functions, and this 
task is the subject of this paper. For a recent review of exponential integrators, we 
refer the reader to Minchev and Wright (2005). 

ODEs of the form exploited by exponential integrators often arise when semi- 
discretizing a partial differential equation. Typically, the matrix appearing in the 
linear part is large and sparse. For such matrices, Krylov subspace methods provide 
an extremely efficient means of evaluating the action of an arbitrary matrix func- 
tion on a vector, without the need to evaluate the matrix function itself. The use of 
Krylov subspace approximations for the action of matrix exponential was pioneered 
by several authors in the late eighties and early nineties, notably Friesner, Tucker- 
man, Dornblaser, and Russo (1989) and Gallopoulos and Saad (1992). Hochbruck 
and Lubich (1997) show that for particular classes of matrices, the convergence of 
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the action of the matrix exponential on a vector is faster than that for the solu- 
tion of the corresponding linear system. This suggests that exponential integrators 
can be faster than implicit methods, because computing the action of the matrix 
exponential and solving linear systems are the two fundamental operations for ex- 
ponential integrators and implicit methods, respectively. This analysis breathed 
life back in the class of exponential integrators invented in the early sixties, which 
had been abandoned in the eighties due to their excessive computational expense. 

In their well-known paper on computing the matrix exponential, Moler and Van 
Loan (2003) write: "The most extensive software for computing the matrix expo- 
nential that we are aware of is expokit." This refers to the work of Sidje (1998), 
who wrote software for the computation of the matrix exponential of both small 
dense and large sparse matrices. The software uses a Krylov subspace approach in 
the large sparse setting. Computing the action of the matrix exponential is equiva- 
lent to solving a linear ODE, and Sidje uses this equivalence to apply time-stepping 
ideas from numerical ODE methods in expokit. The time-steps are chosen with 
the help of an error estimate due to Saad (1992). Sidje (1998) extends this approach 
to the computation of the first ^-function, which appears in the solutions of linear 
ODEs with constant inhomogeneity. This was further generalized by Sofroniou and 
Spaletta (2007) to polynomial inhomogeneities (or, from another point of view, to 
general 92- functions); their work is included in the latest version of mathematica. 

In all the approaches described above the size of the Krylov subspace is fixed, 
generally to thirty. Hochbruck, Lubich, and Sclhofer (1998) explain in their land- 
mark paper one approach to adapt the size of the Krylov subspace. In this paper, 
we develop a solver which combines the time-stepping ideas of Sidje (1998) and 
Sofroniou and Spaletta (2007) with the adaptivity of the dimension of the Krylov 
subspace as described by Hochbruck, Lubich, and Selhofer (1998). We do not ex- 
amine the case of small dense matrices; this has been studied by Koikari (2007) and 
Skaflestad and Wright (2009). We do however take this opportunity to refer the 
reader to the paper by Niesen and Wright (2009), which use the algorithm described 
in this paper as the kernel for an efficient implementation of certain classes of expo- 
nential integrators. We also recommend the recent book by Higham (2008), which 
discusses many of the issues related to matrix functions and their computation. 

The approach followed in this paper, reducing large matrices to smaller ones 
by projecting them on Krylov subspace, is not the only game in town. Other 
possibilities arc restricted-denominator rational Krylov methods (Moret 2007) , the 
real Leja point method (Caliari and Ostermann 2009), quadrature formulas based 
on numerical inversion of sectorial Laplace transforms (Lopez-Fernandez 2008), and 
contour integration (Schmelzer and Trefethen 2007). These methods are outside 
the scope of this paper, but we intend to study and compare them in future work. 

The outline of this paper follows. In Section 2 we present several useful results 
regarding the 99-functions. The algorithm we have developed is explained in Sec- 
tion 3, where we present the Krylov subspace method, show how error estimation, 
time-stepping and adaptivity are handled in our algorithm, and finally give some 
instructions on how to use our implementation. Several numerical experiments are 
given in Section 4 followed by some concluding remarks and pointers towards future 
work in Section 5. 
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2. The ^-functions 

Central to the implementation of exponential integrators is the efficient and 
accurate evaluation of the matrix exponential and other ^-functions. These ip- 
functions are defined for scalar arguments by the integral representation 

(1) ^o(z) = e*, Mz) = j I 1 Jy J o 1 e (1 - e)z O e - 1 de, £ = 1, 2, . . . ,z e C. 
For small values of £, these functions are 



i ~2 



c z -1 , „ e z -l-z , , e z - 1 - z - ±z 



The (^-functions satisfy the recurrence relation 

(2) <pt{z) = z<p t+ i(z) + j v £=1,2,.... 

The definition can then be extended to matrices instead of scalars using any of the 
available definitions of matrix functions, such as that based on the Jordan canonical 
form (Horn and Johnson 1991; Higham 2008). 

Every stage in an exponential integrator can be expressed as a linear combination 
of (^-functions acting on certain vectors: 

(3) <Po(A)v + (p 1 (A)v 1 + f 2 (A)v 2 H h f p (A)v p . 

Here p is related to the order of the exponential integrator and A is a matrix, often 
the Jacobian for exponential Rosenbrock-type methods or an approximation to it 
for methods based on the classical linear/non-linear splitting; usually, A is large 
and sparse. 

We need to compute expressions of the form (3) several times in each step that 
the integrator takes, so there is a need to evaluate these expressions efficiently and 
accurately This is the problem taken up in this paper. We would like to stress that 
any procedure for evaluating (3), such as the one described here, is independent 
of the specific exponential integrator used and can thus be re-used in different 
integrators. The exponential integrators differ in the vectors vq, . . . , v p appearing 
in (3). 

The following lemma gives a formula for the exact solution of linear differential 
equations with polynomial inhomogeneity. This result partly explains the impor- 
tant role that (^-functions play in exponential integrators (see Minchev and Wright 
(2005) for more details). The lemma also provides the background for the time- 
stepping procedure for the evaluation of (3) which we develop in §3.3. 



Lemma 2.1. The solution of the non- autonomous linear initial value problem 
(4) u'(t) = Au(t) + ^2—v j+1 , u(t k )=u k , 

is given by 



3=0 J 



P-1 j t i-t 

U{t k + Tfc) = Lp (T k A)u k + ^2^2 , . k _ pv 1~l +1 (p e+1 (T k A)v j+1 , 

where the functions tpe are defined in (1). 
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Proof. Recall that ipo denotes the matrix exponential. Using ipo{{tk — t)A) as an 
integrating factor for (4) we arrive at 

f Tk p — i _|_ j. \j 

u(t k + T fe ) = ip (T k A)u k + (po(T k A) / ip {-sA) ^2 k -| k v 3+i ds 

j=o 3' 

= ip (T k A)u k + tp a {T k A) j <po(-sA) 7i?~ — K\ v i +1 ds - 

Now change the integration variable, s = 8r k , and apply the definition (1) of the 
(^-functions. 

P-l 3 t J-l 

= tp Q (T k A)u k + 7TZ ,/ t + VwK^)" 3 +i- 

□ 

3. The algorithm 

This section describes the details of our algorithm for evaluating expressions 
of the form (3) as implemented in the matlab function phipm. In the first part 
of this section we explain how Krylov subspace techniques can be used to reduce 
large matrices to small ones when evaluating matrix functions. An estimate of the 
error committed in the Krylov subspace approximation is essential for an adaptive 
solver; this is dealt with in the second part. Then we discuss how to split up the 
computation of the ^-functions into several steps. Part four concerns the possibility 
of adapting the Krylov subspace dimension and the size of the steps using the error 
estimate from the second part, and the interaction between both forms of adaptivity. 
Finally, we explain how to use the implementation provided in the phipm function. 

3.1. The basic method. We start by considering how to compute <p p (A)v, where 
A is an n x n matrix (with n large) and v e R™. This is one of the terms in (3). 
We will use a Krylov subspace approach for this task. 

The idea behind this Krylov subspace approach is quite simple. The vector 
ip p (A)v lives in R™, which is a big space. We approximate it in a smaller space of 
dimension m. This smaller space is the Krylov subspace, which is given by 

K m = span{u, Av, A 2 v, . . . , A m ~ 1 v}. 

However, the vectors A 3 v form a bad basis for the Krylov subspace because they 
point in almost the same direction as the dominant eigenvector of A; thus, the basis 
vectors are almost linearly dependent. In fact, computing these successive products 
is the power iteration method for evaluating the dominant eigenvector. Therefore, 
we apply the (stabilized) Gram-Schmidt procedure to get an orthonormal basis of 
the Krylov subspace: 

K m = span{t>i,t> 2 , ■ ■ -,v m }. 

Let V m denote the n-by-m matrix whose columns are v\, . . . , v m . Then the m-by-m 
matrix H m = V^AV m is the projection of the action of A to the Krylov subspace, 
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expressed in the basis {vi, . . . , v m }. The Arnoldi iteration (Algorithm 1) computes 
the matrices H m and V m , see Saad (1992). 



Algorithm 1 The Arnoldi iteration. 

hi,i = \\v\\; vi = v/hi t i 
for j = 1, . . . , m — 1 do 

w = Avj 

for i = 1, . . . , j do 

hi j — vfw; w — w — hijVi 
end for 

h 3+i,j = HI; = w / h 3+i,j 

end for 



It costs |(m 2 — m + l)n floating point operations (flops) and m products of 
A with a vector to compute the matrices H m and V m . The cost of one of these 
matrix-vector products depends on the sparsity of A; the straightforward approach 
uses 2Na flops where Na is the number of nonzero entries in the matrix A. 

The projection of the action of A on the Krylov subspace K m in the standard 
basis of R" is V m H m V^. We now approximate ip p (A)v by <p p (V m H m V£)v. Since 
v m v m = I m and V m Vj t v = v we have <p p (V m H m V£)v = V m <p p (H m )V£v. Finally 
V^v = 0ei, where (3 = ||u|| and e\ is the first vector in the standard basis. Taking 
everything together, we arrive at the approximation 

(5) ¥> P {A)v w /3V m <p p (H m )ei. 

The advantage of this formulation is that the matrix H m has size m-by-m and thus 
it is much cheaper to evaluate <p p (H m ) than <p p (A). 

The matrix H m is Hessenberg, meaning that the (i, j) entry vanishes whenever 
i > j + 1. It is related to the matrix A by a similarity (H m = V^AV m ). If A 
is symmetric, then H m is both symmetric and Hessenberg, which means that it is 
tridiagonal. In that case, we denote the matrix by T m , and we only need traverse 
the i-loop in the Arnoldi iteration twice: once for i = j — 1 and once for i = j. The 
resulting algorithm is known as the Lanczos iteration (Algorithm 2); see Trefethen 
and Bau (1997, Alg 36.1). It takes 3(2™ — l)n flops and m products of A with a 
vector to compute T m and V m when A is symmetric. 



Algorithm 2 The Lanczos iteration. 

ti,o = 0; v = 0; v\ = v/\\v\\ 
for j = 1, . . . , m do 

w — Avj] tjj — vjw 

^•-i = ll^ll; tj-hj = \\ w \\ 

Vj+l = w/tjj-l 

end for 



The Krylov subspace algorithm reduces the problem of computing ip p (A)v where 
A is a big n-by-n matrix to that of computing ip p (H m )ei where H m is a smaller 
m-by-m matrix. Skaflestad and Wright (2009) describe a modified scaling-and- 
squaring method for the computation of (p p (H m ). However, this method has the 
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disadvantage that one generally also has to compute ipo{H m ), . . . , ip p -i(H m ). It 
is usually cheaper to compute the matrix exponential exp(H m ) of a slightly larger 
matrix H m , following an idea of Saad (1992, Prop. 2.1), generalized by Sidje (1998, 
Thm. 1) to p > 1. Indeed, if we define the augmented matrix H m by 

H m e\ m rows 
/ p — 1 rows 
oj 1 row 

then the top m entries of the last column of exp(_ff m ) yield the vector Lp p (H m )e\. 
Finally, we compute the matrix exponential exp(H m ) using the degree-13 diagonal 
Pade approximant combined with scaling and squaring as advocated by Higham 
(2005); this is the method implemented in the function expm in matlab Ver- 
sion 7.2 (R2006a) and later. In contrast, expokit uses the degree-16 uniform 
rational Chcbyshev approximant for symmetric negative-definite matrices and the 
degree-6 diagonal Pade approximant for general matrices, combined with scaling 
and squaring. We choose not to deal with the negative-definite matrices separately, 
but consider this as a possible extension to consider at a later date. Koikari (2009) 
recently developed a new variant of the Schur-Parlett algorithm using three-by- 
three blocking; we have not yet found the time to evaluate this method. 

The computation of exp(H m ) using Higham's method requires one matrix divi- 
sion (costing |(m + p) 3 flops) and 6+ |~log 2 (||-ff m || i/5.37)~| + matrix multiplications 
(costing 2(m+p) 3 flops each), where \x] + denotes the smallest nonnegative integer 
larger than x. Thus, the total cost of computing the matrix exponential of H m is 
M(H m ) (m + p) 3 where 

(7) M(A) = y + 2 

Only the last column of the matrix exponential cxp(H m ) is needed; it is natural 
to ask if this can be exploited. Alternatively one could ask whether the computa- 
tion of exp(H m ) using the scaling- and-squaring algorithm can be modified to take 
advantage of the fact that H m is Hessenberg (Higham (2008, Prob. 13.6) suggests 
this as a research problem). The algorithm suggested by Higham (2005, Alg. 2.3) 
computes the matrix exponential to machine precision (Higham considers IEEE 
single, double and quadruple precision). The choice of the degree of the Pade 
approximation and the number of scaling steps is intimately connected with this 
choice of precision. However, we generally do not require this accuracy. It would be 
interesting to see if significant computational savings can be gained in computing 
the matrix exponential by developing an algorithm with several other choices of 
precision. 

3.2. Error estimation. Saad (1992, Thm. 5.1) derives a formula for the error in 
the Krylov subspace approximation (5) to (p p (A)v in the case p — 0. This result 
was generalized by Sidje (1998, Thm. 2) to p > 0. Their result states that 

oo 

(8) <fip(A)v - pV m ip p (H m )ei = (3 h m+ - L , m ef n <p p+1 (H m )e 1 A : >- p - 1 v m+1 . 

j-p+i 

The first term in the series on the right-hand side does not involve any multiplica- 
tions with the matrix A and the Arnoldi iteration already computes vector w m +i, 



(6) 



H m = 



log; 



Pill 

5.37 
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so this term can be computed without too much effort. We use this term as an 
error estimate for the approximation (5): 

(9) e = \\Ph m+1 , m e'^ l ip p+1 (H m )e 1 v m+1 \\ = P\h m+1>m \ [<Pp+i(H m )] mtl . 

Further justification for this error estimate is given by Hochbruck, Lubich, and 
Selhofer (1998). 

We also use this error estimate as a corrector: instead of (5), we use 

(10) <P P {A)v w i5V m ^ p (U m )ex + Ph m+lim ef n <P P +i{H m )e 1 v m+1 . 

If <p p +i(H m )ei is computed using the augmented matrix (6), then ip p (H m )e\ also 
appears in the result, so we only need to exponentiate a matrix of size m + p+ 1. 
The approximation (10) is more accurate, but e is no longer a real error estimate. 
This is acceptable because, as explained below, it is not used as an error estimate 
but only for the purpose of adaptivity. 

Sidje (1998) proposes a more accurate error estimate which also uses the second 
term of the series in (8). However, the computation of this requires a matrix- vector 
product and the additional accuracy is in our experience limited. We thus do not 
use Sidje's error estimate. 



3.3. Time-stepping. Saad (1992, Thm. 4.7) proves that the error of the approxi- 
mation (5) satisfies the bound 

(11) ||Error|| < ^^(l + o(l)) 

for sufficiently large m, where p(A) denotes the spectral radius of A. This bound 
deteriorates as p(A) increases, showing that the dimension m of the Krylov subspace 
has to be large if p(A) is large. Sidje (1998) proposes an alternative method for 
computing (pi(A)v when p(A) is large, based on time-stepping. This approach is 
generalized by Sofroniou and Spaletta (2007) for general (^-functions. 

The main idea behind this time-stepping procedure is that <p p (A)v solves a non- 
autonomous linear ODE. More generally, Lemma 2.1 states that the function 

(12) u(t) = (p (tA)v + tipi(tA)vi + t 2 ip 2 (tA)v 2 + ■■■+ t p ip p (tA)v p , 
is the solution of the differential equation 

£J>-1 

(13) u'{t) = Au{t) + Vl +tv 2 + --- + — v p , u(0) = v . 

We now use a time-stepping method to calculate u(t en d) for some t on( j € R. If we 
want to compute an expression of the form (3), we set t cn d — 1. Split the time 
interval [0, tend] by introducing a grid = to < t\ < . . . < t n = t C nd- To advance 
the solution, say from tk to tk+i, we need to solve the differential equation (13) 
with the value of u(tk) as initial condition. The relation between u(tk) and u(tk+i) 
is given in Lemma 2.1. Rearranging this expression gives 

(14) u(t k +i) = <po(TkA)u(t k ) + ^2T l k <pi(T k A)^2 

i=l j=0 '■' 
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where Tfc = tk+i — tk- However, we do not need to evaluate all the (^-functions. The 
recurrence relation ip q (A) = tp q+ i(A)A + ^1 implies that 

Vq {A)^^(A)A^+j2 7zhy Aj > <! = 0,1,..., p-1. 

3 =0 Vi^JI- 

Substituting this in (14) yields 

P- 1 T 3 

(15) u(t k+1 ) = T%ip p (T k A)w p + ^2 J J W 3^ 

3=0 3 ' 

where the vectors Wj are given by 

3 3-i t n 

Wj = A j u(t k ) + J2 A3 ~ l J^i+t' 3 = 0, 1, • • • ,p. 
i=i e=o ' 

This is the time-stepping method for computing (3). 

The computational cost of this method is as follows. At every step, we need 
to compute the vectors Wj for j = Q,...,p, the action of ip p (rkA) on a vector, 
p + 1 scalar multiplications of a vector of length n and p vector additions. The 
vectors Wj satisfy the recurrence relation 

v-3 t e 

(16) Wo=u(t k ) and w 3 = Awj-i +^^v j+t , j = l,...,p, 

£=0 

and hence their computation requires p multiplications of A with a vector, p scalar 
multiplications, and p vector additions. 

One reason for developing this time-stepping method is to reduce the dimension 
of the Krylov subspace. If the spectrum of A is very large, multiple time-steps 
may be required. We intend, in the future, to compare this approach with the 
approach described by Skaflestad and Wright (2009) which evaluates the matri- 
ces ipo(H m ), . . . , ip p+ i(H m ) directly. The latter method may have computational 
advantages, particularly if v , . . . ,v p are equal or zero (Niesen and Wright (2009) 
describe settings in which this happens). 

We choose an initial step size similar to the one suggested in expokit, except 
we increase the rather conservative estimate by an order of magnitude, to give 

_ ^0_ / TOL (m^±iy^+ 1 ^27r(m ave + 1) 

T ° H^Hoo ^ 4P|| oo || W0 ||oo 

where TOL is the user defined tolerance and m avo is the average of the input and 
maximum allowed size of the Krylov subspace. 

3.4. Adaptivity. The procedure described above has two key parameters, the di- 
mension m of the Krylov subspace and the time-step r. These need to be chosen 
appropriately. As we cannot expect the user to make this choice, and the optimal 
values may change, the algorithm needs to determine m and r adaptively. 

We are using a time-stepping method, so adapting the step size r is similar 
to adaptivity in ODE solvers. This has been studied extensively. It is described 
by Butcher (2008, §39) and Hairer, N0rsett, and Wanner (1993, §11.4), among 
others. The basic idea is as follows. We assume that the time-stepping method has 
order q, so that the error is approximately Cr q+1 for some constant C. We somehow 
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compute an error estimate e and choose a tolerance TOL that the algorithm should 
satisfy. Then the optimal choice for the new step size is 

/nv^+D t || £ || 

(17) Tncw = Tk I — where to = 



oj ) T k - TOL ' 

The factor t cn d is included so that the method is invariant under time scalings. 
Usually, a safety factor 7 is added to ensure that the error will probably satisfy the 
error tolerance, changing the formula to r now = Tk('f/uj) 1 ^ q+1 \ Common choices 
are 7 = 0.25 and 7 = 0.38. However, in our case the consequence of rejecting a 
step is that we computed the matrix exponential in vain, while in ODE solvers the 
whole computation has to be repeated when a step is rejected. We may thus be 
more adventurous and therefore we take 7 = 0.8. 

In our scheme, the error estimate e is given by (9). However, what is the or- 
der q for our scheme? The a priori estimate (11) suggests that the order equals 
the dimension m of the Krylov subspace, but experiments show that this is too 
optimistic. We can estimate the order if we have attempted two step sizes during 
the same step, which happens if we have just rejected a step and reduced the step 
size. The estimated order is then 

fia\ - log r- log T i d 1 

18 ) i = ] — in — i — ii — ii -1 ' 

log ||q| - log||e id|| 

where e and e id denote the error estimates produced when attempting step size r 
and Toid, respectively. In all other cases, we use the heuristic q = \m. With this 
estimate for the order, we compute the suggested new step size as 



(19) T ncw = T k ( — ^ 



1/(9+1) 



The other parameter that we want to adapt is the Krylov subspace dimension m. 
The error bound (11) suggests that, at least for modest changes of m, the error is 
approximately equal to Ck~" 1 for some values of C and k. Again, we can estimate 
k if we have error estimates corresponding to two different values of m: 

/ II =-11 \ l/( m old-m) 

If this formula cannot be used, then we take k — 2. Given this estimate, the 
minimal m which satisfies the required tolerance is give by 

log u — log 7 



(21) m new = m 



logk 



We now have to choose between two possibilities: either we keep m constant and 
change r to T ncw , or we keep t constant and change m to m ncw . We will pick the 
cheapest option. To advance from tk to tk+i, we need to evaluate (15). Computa- 
tion of the vectors Wj requires 2(p — 1)(A^ + n) flops. Then, we need to do m steps 
of the Arnoldi algorithm, for a cost of |(m 2 — m + l)n + 2itiNa flops. If A is sym- 
metric, we will use the Lanczos algorithm and the costs drops to 3(2m — l)n+2mNA 
flops. To compute ip p (TkA)w p in (15) using (10), we need to exponentiate a matrix 
of size m + p + 1, costing M(H m ) (m + p + l) 3 flops with M(H m ) given by (7). 
Finally, the scalar multiplications and vector additions in (15) requires a further 
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(2p + l)n flops. All together, we find that the cost of a single step is 
(22) 

[ (m + p)N A + 3{m + p)n + M(H m ) {m + p + l) 3 , for Lanczos; 

Ci (m) = < 

[ (to + p)Na + (m 2 + 3p + 2)n + M (if m ) (m + p + l) 3 , for Arnoldi. 

This needs to be multiplied with the number of steps required to go from the current 
time tk to the end point t = i nd- So the total cost is 

(23) C(r,m) = 

We compute C(r ncw , to) and C(r, TO ncw ) according to this formula. If C(t dcw , to) is 
smaller, then we change the time-step to r ncw and leave to unchanged. However, to 
prevent too large changes in r wc restrict it to change by no more than a factor 5. 
Similarly, if C(t dcw , to) is smaller, then t remains as it is and we change to to m DCW , 
except that we restrict it to change by no more as a factor | (this factor is chosen 
following Hochbruck, Lubich, and Sclhofer (1998)). 

Finally, the step is accepted if u> > 6 where S = 1.2. Thus, wc allow that 
the tolerance is slightly exceeded. The idea is that our adaptivity procedure aims 
to get u> down to 7 = 0.8, so that usually we stay well below the tolerance and 
hence we may permit ourselves to exceed it occasionally. The resulting algorithm 
is summarized in Algorithms 3 and 4. 



^end tk 



dim). 



Algorithm 3 Computing the linear combination (3). 

t = 0; k = 0; y k = v 
t = 1; to = 1 
repeat 

Compute w , ■ ■ ■ ,w p according to (16) 
repeat 

Compute H m and V m using Algorithm 1 or 2 

F = approximation to ip p (tA)w p given by (10) 

e = error estimate given by (9) 

Compute co according to (17) 

Compute T ncw and m ncw using Algorithm 4 

Compute C(T ncw ,TO) and C(r, TO nC w) according to (7), (22) and (23) 
if C(T nC w,m.) < C(t, TO n cw) then 

t = min{ max {Tncw, |r} , 5t, 1 — t } 
else 

to = min{max{TO n cw, L| TO J> l}> l~t m l} 
end if 
until lo < S 

Compute Uk+i according to (15) 

t = t + t; k = k + 1 
until t = 1 
return y k 



3.5. The matlab function phipm. The algorithm described above is implemented 
in a matlab function called phipm. In terms of computing power, the system 
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Algorithm 4 Computing r ncw and m ncw . 

if previous step was rejected and r was reduced then 

Compute q according to (18) 
else if previous step was rejected and q was computed in previous step then 

Keep old value of q 
else 

Q = \ m 
end if 

Compute T ncw according to (19) 

if previous step was rejected and m was reduced then 

Compute k according to (20) 
else if previous step was rejected and k was computed in previous step then 

Keep old value of k 
else 

k = 2 
end if 

Compute m new according to (21) 



requirements are modest. Any computer capable of running a moderately up-to- 
date version of MATLAB is sufficient. 

A call of the phipm function has the form 

[u, stats] = phipm(t, A, v, tol, symm, m) 

There are three mandatory input arguments and one mandatory output argument; 
the other arguments are optional. The first input argument is t, the final time, 
t on d, for the differential equation (13). This is generally chosen to be t = 1 because 
the solution (12) of (13) at t = 1 equals the linear combination (3). The second 
argument is the n-by-n matrix argument of the (^-functions. The phipm function 
can also be used without forming the matrix A explicitly, by setting the argument 
A to a function which, given a vector v, computes Av. Finally, v is an n-hy-(p + 1) 
matrix with columns representing the vectors v , v\ , . . . , v p to be multiplied by 
the corresponding (^-functions. There is one mandatory output argument: u, the 
numerical approximation to the solution of (13) at the final time tend- 
There are three optional input arguments. The first one is tol, the toler- 
ance TOL in (18). The default tolerance is 10~ 7 . Then comes symm, a boolean 
indicating whether A is symmetric (symm=l) or not (symm=0). If not supplied, the 
code determines itself whether A is symmetric if the matrix is passed explicitly, and 
assumes that A is not symmetric if the matrix is given implicitly. The final input 
argument is m, the initial choice for the dimension of the Krylov subspace. The ini- 
tial choice is m = 1 by default. There is also one optional output argument: stats, 
for providing the user with various statistics of the computation. It is a vector with 
four entries: stats (1) is the number of steps needed to complete the integration, 
stats (2) is the number of rejected steps, stats (3) is the number of matrix- vector 
products, and stats (4) is the number of matrix exponentials computed. 

4. Numerical experiments 

In this section we perform several numerical experiments, which illustrate the 
advantages of the approach that has been outlined in the previous sections. We 
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compare the function phipm with various state-of-the-art numerical algorithms. 
The first experiment compares several numerical ODE solvers on a large system of 
linear ODEs, resulting from the finite-difference discretization of the Heston PDE, a 
common example from the mathematical finance literature. The second experiment 
compares our function phipm and the expv and phiv functions from expokit on 
various large sparse matrices. This repeats the experiment reported by Sidje (1998). 

All experiments use MATLAB Version 7.6 (R2008a), which computes the matrix 
exponential as described by Higham (2005). 

4.1. The Heston equation in financial mathematics. A European call option 
gives its owner the right (but not obligation) to buy a certain asset for a certain 
price (called the strike price) at a certain time (the expiration date). European call 
options with stochastic volatility, modeled by a stochastic mean-reverting differen- 
tial equation, have been successfully priced by Heston (1993). The Heston pricing 
formulae are a natural extension of the celebrated Black-Scholes-Merton pricing 
formulae. Despite the existence of a semi-closed form solution, which requires the 
numerical computation of an indefinite integral, the so-called Heston PDE is often 
used as a test example to compare various numerical integrators. This example 
follows closely In 't Hout (2007) and In 't Hout and Foulon (2009). 

Let U (s, v, t) denote the European call option price at time T — t, where s is 
the price of the underlying asset and v the variance in the asset price at that 
time. Here, T is the expiration date of the option. Hcston's stochastic volatility 
model ensures that the price of a European call option satisfies the time-dependent 
convection-diffusion- reaction equation 

U t = ^vs 2 U ss + p\vsU vs + ^X 2 vU vv + (r d - r f )sU s + k(t] - v)U v - r d U. 

This equation is posed on the unbounded spatial domain s > and v > 0, while t 
ranges from to T. Here, p G [—1,1] represents the correlation between the Wiener 
processes modeling the asset price and its variance, A is a positive scaling constant, 
r d and r / are constants representing the risk- neutral domestic and foreign interest 
rates respectively, r\ represents the mean level of v and k the rate at which v reverts 
to rj. The payoff of the European call option provides the initial condition 

U(s, v, 0) = max(s — K, 0), 

where K > denotes the strike price. 

The unbounded spatial domain must be restricted in size in computations and 
we choose the sufficiently large rectangle [0, S] x [0, V]. Usually S and V are chosen 
much larger than the values of s and v of practical interest. This is a commonly 
used approach in financial modeling so that if the boundary conditions are imperfect 
their effect is minimized, see Tavella and Randall (2000, p. 121). Suitable boundary 
conditions for < t < T are 

U(0,v,t) = 0, 

U(s,V,t) = s, 

U s {S,v,t) = l, 

U t (s,0,t) - (r d -r f )sU s (s,0,t) - nr]U v (s, 0, t) + rU(s, 0, t) = 0. 

Note that the boundary and initial conditions are inconsistent or non-matching, 
that is the boundary and initial conditions at t = do not agree. 
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We discretize the spatial domain using a uniform rectangular mesh with mesh 
lengths As and Av and use standard second-order finite differences to approximate 
the derivatives as follows: 





_ Ui+i,j - Ui-ij 


2As 


(U ss )i,j s 


Ui+i.j — 2Uij + Ui-ij 


As 2 




Uij+i — Uij-i 


2Av 


(U v )ifi £ 


_ -3U ifi + 4£/ M - U i>2 


2Av 


(U vv )ij p 


Uij+i — %Uij + Uij-i 


Av 2 




Ui+ij+i + Ui-ij-i — Ui 


4AvAs 



The boundary associated to v = is included in the mesh, but the other three 
boundaries are not. Combining the finite-difference discretization and the boundary 
conditions leads to a large system of ODEs of the form 

u'(t) = Au(t) +vi, u(0) = v . 

The exact solution for this system is given in Lemma 2.1. This is a natural problem 
for the phipm solver. 

We compare six methods: the scheme of Crank and Nicolson (1947), three Al- 
ternating Direction Implicit (ADI) schemes, ode 15s from MATLAB and the phipm 
method described in this paper. The three ADI schemes are the method due to 
Douglas and Rachford (1956) with 6 — j, the method of Craig and Sneyd (1988) 
with 9 = a = i, and the method of Hundsdorfer and Verwer (2003) with & = jq 
and \x = \. The first ADI method is of order one and the other two are of order 
two. In 't Flout (2007) lists each of the ADI methods described above and explains 
necessary implementation details; the stability of these methods is discussed in In 't 
Hout and Welfert (2009). To compare phipm with an adaptive solver we choose the 
in-built MATLAB solver odel5s which is a variable step size, variable order imple- 
mentation of the backward differentiation formulae (BDF). The Jacobian of the 
right-hand side (that is, the matrix A) is passed to odel5s; this makes odel5s 
considerably faster. 

We choose the same problem parameters as Case 2 in the paper by In 't Hout and 
Foulon (2009): namely k = 3, v = 0.12, A = 0.04, p = 0.6, r d = 0.01, r f = 0.04, 
the option maturity T = 1 and the strike price K = 100. The spatial domain 
is truncated to [0,8^] x [0,5]. We use a grid with 100 points in the s-direction 
and 51 points in the w-direction (recall that v = is included in the grid and the 
other borders are not). This results in a system with 5100 degrees of freedom, with 
nnz = 44, 800 non-zero elements. Figure 1 shows the error of the solvers against 
the CPU time. The Crank-Nicolson and ADI schemes are run with the step size 
decreasing in powers of two from 2~ 8 to 2~ 16 , while phipm and odel5s are run with 
the tolerance decreasing geometrically from 100 to 10~ 3 and from 10 _1 to 10~ 6 , 
respectively. The error is computed by comparing the numerical solution against 
the "exact" solution, as computed by phipm with a tolerance of 10~ 10 . We measure 
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cpu time 

Figure 1. Plots of error against CPU time for the system of 
ODEs from the discretized Heston PDE. The three ADI schemes 
are the represented by the: green line (Douglas); red line (Craig 
and Sneyd); blue line (Hundsdorfer and Verwer), the cyan line is 
Crank-Nicolson, the magenta line is ode 15s and the black line is 
phipm. 

the maximum error at time t = T on the set [0, 2K] x [0, 1], which is smaller than 
the computational domain. 

The results show that for this example, phipm is the best algorithm for tolerances 
of interest to practitioners (an error of 10 -4 corresponds to one basis point), even 
outperforming the carefully tuned ode 15s. The phipm function does produce more 
accurate than requested results in this experiment, which could lead to a loss of 
efficiency, when embedding this function in a larger solver as done in (Niesen and 
Wright 2009). The situation is not as clear for other parameter values listed in In 't 
Hout and Foulon (2009), with the cross-over point closer to 10~ 6 . We have also 
noticed that Figure 1 depends on the computer's operating system but the overall 
nature of the plot remains the same. 

Figure 2 shows how the Krylov subspace size (top graph) and the step size 
(bottom graph) change during the integration interval. They both vary during the 
integration, which shows that the adaptivity presented in Section 3.4 is effective. 
Figure 3 plots the error estimate and the actual error during the integration interval. 
The error estimate is always larger than the actual error in this experiment. 

One worrying aspect in these figures is that the step size sequence in Figure 2 zig- 
zags and that the estimated error in Figure 3 varies rather wildly from step to step. 
This might indicate stability problems, perhaps caused by augmenting the matrix 
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Figure 2. Plots showing how the dimension m of the Krylov sub- 
space and the step size r change during the integration of the dif- 
ferential equation (13), as solved by phipm with a tolerance of 10 -4 . 
The green crosses represent rejected steps. 




0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 



f 

Figure 3. The error estimate (blue) and the actual error (red). 
Only accepted steps are shown. Again, the tolerance is 10~ 4 . 
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in (6) or by the matrix- vector multiplications in (16). We intend to investigate 
this further in future work; perhaps the use of control theory techniques can be 
useful in smoothing the error estimate and choice of Krylov subspace size. Also, 
given that this equation has a semi-closed form solution, the application to more 
exotic options, such as barrier, American and Asian options, is of more practical 
significance and in our minds for future work. 

Finally, we draw attention to the fact that the linear algebra in odel5s the 
Crank-Nicolson and all the ADI methods, is performed using highly optimized 
routines written in a low-level language, whereas all the computations in phipm are 
done in the native matlab language. 

4.2. Comparisons between phipm and the functions in expokit. The Univer- 
sity of Florida sparse matrix collection, compiled by Davis (2007), is an excellent, 
well-maintained website containing various classes of sparse matrices. All the sparse 
matrices that we use in this subsection are available from that website. 

Experiment 1. We compute the action of the matrix exponential of four different 
sparse matrices described below on certain vectors vq. We use the expv and phiv 
routines from expokit and the phipm routine described in this paper to compute 
e tA Vo (the phiv routine computes e tA vo + (pi(tA)v\, so by setting v\ = this code 
can also be used to compute e tA Vo). The expv and phiv routines are implemented 
in MATLAB like the phipm routine, but they do not use the Lanczos algorithm when 
A is symmetric. Therefore, we implemented a variant of phipm which computes 
(12) using a Krylov subspace of fixed dimension m = 30. This variant is called 
phip. It can be considered as an extension of the phiv code to symmetric matrices 
and p > 1 and similar to the code developed by Sofroniou and Spaletta (2007) . 
The four matrices we consider are as follows: 

• The first matrix, orani678 from the Harwell-Boeing collection (Duff, Grimes, 
and Lewis 1989), is an unsymmctric sparse matrix of order n = 2, 529 with 
nnz — 90, 158 nonzero elements. We choose t = 100, v = [1,1,..., 1] T and 
TOL = y/e, where e = 2~ 52 denotes the machine epsilon. 

• The second example is bcspwrlO, also from the Harwell-Boeing collection. 
This is a symmetric Hermitian sparse matrix of order n = 5, 300 with 
nnz = 21, 842. We set t = 10, v = [l,0,..., 0, 1] T and TOL = 10~ 5 . 

• The third example, gr_30_30, again of the Harwell-Boeing collection, is a 
symmetric matrix arising when discretizing the Laplacian operator using a 
nine-point stencil on a 30 x 30 grid. This yields an order n — 900 sparse 
matrix with nnz = 7, 744 nonzero elements. Here, we choose t = 2 and 
compute e~ tA e tA v 0} where u = [1, 1, . . . , 1] T , in two steps: first the forward 
step computes w — e tA v n and then we use the result w as the operand 
vector for the reverse part e~ tA w. The result should approximate vq with 
TOL = 10~ 14 . 

• The final example uses helm2d03 from the GHS_indef collection (see (Davis 
2007) for more details), which describes the Helmholtz equation — A t Vm — 
IOOOOu = 1 on a unit square with Dirichlet u — boundary conditions. 
The resulting symmetric sparse matrix of order n = 392, 257 has nnz = 
2,741,935 nonzero elements. We compute e tA vo + t(p±(tA)vi, where t = 
2 and uo = v\ — [1, 1, . . . , 1] T . This could alternatively be written as 
e B vo + (fii(B)wi, where B = tA and w\ — tv\; however, phiv gives a 
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Table 1. Comparisons of the average speedup of phiv, phip and 
phipm relative to expv and the relative errors on four matrices 
taken from the University of Florida sparse matrix collection. 





orani678 


bcspwrlO 


gr_30_30 


helm2d03 


code 


speed error 


speed error 


speed error 


speed error 


expv 
phiv 
phip 
phipm 


1 5.2 x 1CT 11 
1.16 1.8 x 1CT 10 
0.73 4.6 x 1CT 12 
2.49 2.8 x 1CT 12 


1 9.2 x 10" 8 
0.99 6.5 x lO- 8 
1.76 2.6 x 10~ 8 
3.19 1.5 x 10" 9 


1 1.0 x 10" 7 
0.98 2.1 x 10~ 7 
2.72 8.3 x 10~ 7 
2.10 4.3 x 10~ 6 


1 4.3 x 10~ 8 
2.35 2.0 x 10" 8 
3.35 1.5 x 10~ 9 



Table 2. Comparisons of the average speedup of phipm relative 
to phip and the relative errors on four large sparse matrices taken 
from the University of Florida sparse matrix collection. 





orani678 


bcspwrlO 


gr_30_30 


helm2d03 


code 


speed error 


speed error 


speed error 


speed error 


phip 
phipm 


1 6.9 x 10" 12 
3.07 7.9 x 10" 12 


1 1.3 x 10" 4 
2.40 5.3 x 10" 5 


1 9.9 x 10" 12 
17.23 2.3 x 10" 12 


1 3.1 x 10" 1U 
1.51 1.2 x 10~ 8 



different result for this formulation because it uses error per unit step, 
instead of the more consistent error per step. In this test, only the codes 
phiv phip and phipm are compared, with TOL = ^fe. 

In the first three comparisons we measure the average speedup of each of the codes 
relative to expv; in the final comparison we measure relative to phiv. We ran 
the comparisons 100 times to compute the average speedup. We summarize our 
findings in Table 1. 

Experiment 2. In these computations we evaluate f (tA)vo + t(pi(tA)vi + • • • + 
t 4 (fi4(tA)v4, where vo = ■ ■ ■ = = [1, 1, . . . , 1] T , with the codes phip and phipm. 
This comparison gauges the efficiency gains achieved by allowing the Krylov sub- 
space size to vary. The implementations are identical except for the fact that 
phipm can vary to as well. We use the four sparse matrices described above, with 
the same values of t and TOL, except for the sparse matrix gr_30_30 where t = 20 
and TOL = y/e, is used and we only compute the forward part of the problem. We 
summarize our findings in Table 2. 

Discussion of the results. These comparisons show that in almost all cases the 
phipm code is more efficient, in some cases by a considerable margin. The only 
exception is the gr_30_30 matrix in Experiment 1, where phip (fixed to) performs 
better than phipm (adaptive to) . This example requires less computations than the 
others; the calculation lasts one-tenth of a second on our computer, as opposed to 
1.5-8 seconds for the other examples. In this case, the cost associated with adapting 
the dimension of the Krylov subspace is not negligible and in fact bigger than the 
performance gain. 

Summarizing, adapting both the dimension of Krylov subspace as well as the 
length of the time steps significantly increases overall efficiency. Given that in an 
implementation of an exponential integrator phipm would be called several times in 
a step over many steps during the integration, this increase in efficiency can often 
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lead to very large overall computational gains. We point the reader to Niesen and 
Wright (2009), where several numerical experiments using phipm in conjunction 
with exponential integrators are reported. 

5. Conclusion and future work 

The phipm function is an efficient routine which computes the action of linear 
combinations of ip- functions on operand vectors. The implementation combines 
time stepping with a procedure to adapt the Krylov subspace size. It can be 
considered as an extension of the codes provided in expokit and mathematica. 

The (/^-functions are the building blocks of exponential integrators. An imple- 
mentation of the algorithm in a lower-level language will be useful in this context; 
this is work on progress. We are also working on the implementation of exponential 
integrators which use the phipm routine described in this paper and hope to report 
on this shortly. 

We intend to improve the code over time. Some issues which we plan to investi- 
gate have already been mentioned. One of them is the issue of stability, especially 
in view of the error estimates in Figure 3 which might point to stability problems. 
Our choice to compute the ^-function f ^he reduced matrix H m by adding some 
rows and columns and then computing the matrix exponential may exacerbate any 
instabilities. Perhaps it is better to compute the ^-function of H m directly. This 
also allows us to exploit the fact that H m is symmetric if the matrix L in the orig- 
inal differential equation is symmetric, for instance by using rational Chebyshev 
approximants. In any case, regardless of whether we augment the matrix H m or 
not, we do not need to compute the matrix function to full precision. It should also 
be possible to exploit the fact that H m is Hessenberg. 

We also intend to modify phipm to take advantage of recent advances in parallel 
processing technology, specifically the use of graphics cards accessed using program- 
ming languages such as CUDA, which provide promise of significant computational 
improvements. Finally, as mentioned in the introduction, there are alternatives 
to the (polynomial) Krylov method considered in this paper. We plan to study 
other methods and compare them against the method introduced here. Competi- 
tive methods can be added to the code, because we expect that the performance of 
the various methods depends strongly on the characteristics of both the problem 
and the exponential integrator. 

Acknowledgements. The authors thank Nick Higham, Karel in 't Hout, Brynjulf 
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